Shotgun Metagenomic Data Analysis ◾ 311
allow more accurate taxonomic group assignment. There are several programs for
assembly-free classification and profiling of microbial communities in metagenomic samples.
Kaiju [3] uses the NCBI taxonomy and RefSeq databases to find maximum exact matches to the
reads at the protein level using the Burrows–Wheeler transform (BWT). CLARK [4] (CLAssifier
based on Reduced K-mers) creates a large index of the k-mers of all target sequences and then
removes the k-mers shared among targets, so that each target is described by its unique
k-mers, which are used for taxonomic classification. Kraken [5] splits the reads into k-mers
and maps them onto a taxonomy tree; the resulting classification tree and its best-supported
path help discriminate closely related microbes. Centrifuge [6] is a rapid classifier that
requires little memory and a relatively small index (only 5.8 GB for bacterial, viral, and
human genomes), so it can run on desktop computers. Centrifuge uses an indexing scheme based
on the BWT and the Ferragina–Manzini (FM) index. These programs are only examples; others
use different algorithms.
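The k-mer idea behind CLARK can be illustrated with a toy sketch. The two helper functions below, kmers and discriminative_kmers, are hypothetical names introduced only for this illustration; they are not part of CLARK, which works on whole reference databases with much longer k-mers and its own index format. The sketch only shows how k-mers shared between two targets are discarded so that the remaining ones identify a single target.

```shell
# Print every k-mer (substring of length k) of a sequence, one per line.
# $1 = sequence, $2 = k
kmers() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k)
    }'
}

# Print the distinct k-mers that occur in target A but not in target B.
# $1 = sequence A, $2 = sequence B, $3 = k
discriminative_kmers() {
    awk -v a="$1" -v b="$2" -v k="$3" 'BEGIN {
        # Record every k-mer of target B.
        for (i = 1; i + k - 1 <= length(b); i++) in_b[substr(b, i, k)] = 1
        # Keep only the k-mers of target A that B does not contain.
        for (i = 1; i + k - 1 <= length(a); i++) {
            kmer = substr(a, i, k)
            if (!(kmer in in_b) && !(kmer in seen)) { seen[kmer] = 1; print kmer }
        }
    }'
}

kmers ACGTA 3                          # prints ACG, CGT, GTA (one per line)
discriminative_kmers ACGTTT ACGTAA 3   # prints GTT, TTT (one per line)
```

A read containing GTT or TTT would then point to the first target only, whereas ACG and CGT carry no discriminating information because both targets share them.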
Most taxonomic classifiers for metagenomic data use a genomic database of known species
to construct an index and then use that index to assign taxa to the metagenomic reads.
The majority of the classifiers require a large amount of storage space for the database
files and a large amount of memory for the indexing and classification processes. Kaiju and
Kraken require a lot of memory (around 128–512 GB). Therefore, we recommend using these
classifiers only if you have enough computational resources. To use any of these classifiers,
you need to download a reference database, build an index from it, and then perform the
classification.
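Before committing to one of the memory-hungry classifiers, it can help to check what the machine offers. The following is a minimal sketch that assumes a Linux system exposing /proc/meminfo; the function names total_mem_gb and meets_requirement are hypothetical helpers, and the 128 GB threshold is simply the lower end of the range quoted above.

```shell
# Report total physical memory in whole GB (assumes Linux: reads /proc/meminfo).
total_mem_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        echo 0   # unknown platform: treat as "cannot confirm"
    fi
}

# Succeed when the available amount ($1, in GB) meets the requirement ($2, in GB).
meets_requirement() {
    [ "$1" -ge "$2" ]
}

required_gb=128   # lower end of the range quoted above for Kaiju and Kraken
if meets_requirement "$(total_mem_gb)" "$required_gb"; then
    echo "Memory looks sufficient for the larger classifiers."
else
    echo "Less than ${required_gb} GB of RAM detected; a lighter classifier such as Centrifuge may be a better fit."
fi

# Free space on the file system that will hold the database files.
df -h .
```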
Kaiju installation instructions are available at
“https://github.com/bioinformatics-centre/kaiju”. You can install it by running the
following commands:
git clone https://github.com/bioinformatics-centre/kaiju.git
cd kaiju/src
make
Then, add the program to your PATH by appending the following line to the “.bashrc” file,
replacing YOUR_PATH with the directory into which you cloned kaiju.
export PATH="YOUR_PATH/kaiju/bin:$PATH"
You must restart the terminal or run “source ~/.bashrc” for the change to take effect. Run
the “kaiju” command to check whether it has been installed.
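The same check can be scripted so that it reports where the executable was found. This is a small sketch; check_tool is a hypothetical helper name, and it relies only on the standard “command -v” shell builtin.

```shell
# Report whether a program is reachable through PATH.
# $1 = program name
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Verify the kaiju executables used in this section.
check_tool kaiju || echo "Check the PATH line in ~/.bashrc and reload it."
check_tool kaiju-makedb || echo "kaiju-makedb should be in the same bin directory as kaiju."
```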
Before using kaiju, you need to download the RefSeq database, either from the NCBI or
from the kaiju website at “https://kaiju.binf.ku.dk/server”. To download it from the NCBI
database, use the following:
mkdir kaijudb
cd kaijudb
kaiju-makedb -s refseq
The download will take a long time and requires a large amount of storage space. When the
database has been downloaded, make sure that the “nodes.dmp”, “kaiju_db_refseq.fmi”, and
“names.dmp” files are present in the “kaijudb” directory. You may need to decompress